Price County
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.04)
- North America > Canada > Alberta > Census Division No. 11 > Sturgeon County (0.04)
- (6 more...)
- Law > Litigation (1.00)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
LexEval: A Comprehensive Chinese Legal Benchmark for Evaluating Large Language Models
Li, Haitao, Chen, You, Ai, Qingyao, Wu, Yueyue, Zhang, Ruizhe, Liu, Yiqun
Large language models (LLMs) have made significant progress in natural language processing tasks and demonstrate considerable potential in the legal domain. However, legal applications demand high standards of accuracy, reliability, and fairness. Applying existing LLMs to legal systems without careful evaluation of their potential and limitations could pose significant risks in legal practice. To this end, we introduce a standardized comprehensive Chinese legal benchmark LexEval. This benchmark is notable in the following three aspects: (1) Ability Modeling: We propose a new taxonomy of legal cognitive abilities to organize different tasks. (2) Scale: To our knowledge, LexEval is currently the largest Chinese legal evaluation dataset, comprising 23 tasks and 14,150 questions. (3) Data: we utilize formatted existing datasets, exam datasets and newly annotated datasets by legal experts to comprehensively evaluate the various capabilities of LLMs. LexEval not only focuses on the ability of LLMs to apply fundamental legal knowledge but also dedicates efforts to examining the ethical issues involved in their application. We evaluated 38 open-source and commercial LLMs and obtained some interesting findings. The experiments and findings offer valuable insights into the challenges and potential solutions for developing Chinese legal systems and LLM evaluation pipelines. The LexEval dataset and leaderboard are publicly available at \url{https://github.com/CSHaitao/LexEval} and will be continuously updated.
- North America > United States > New York > New York County > New York City (0.04)
- Asia > China > Beijing > Beijing (0.04)
- North America > Canada > Alberta > Census Division No. 13 > Westlock County (0.04)
- (7 more...)
- Law > Litigation (1.00)
- Law > Criminal Law (1.00)
- Law Enforcement & Public Safety > Crime Prevention & Enforcement (1.00)
- (2 more...)
LawBench: Benchmarking Legal Knowledge of Large Language Models
Fei, Zhiwei, Shen, Xiaoyu, Zhu, Dawei, Zhou, Fengzhe, Han, Zhuo, Zhang, Songyang, Chen, Kai, Shen, Zongwen, Ge, Jidong
Large language models (LLMs) have demonstrated strong capabilities in various aspects. However, when applying them to the highly specialized, safe-critical legal domain, it is unclear how much legal knowledge they possess and whether they can reliably perform legal-related tasks. To address this gap, we propose a comprehensive evaluation benchmark LawBench. LawBench has been meticulously crafted to have precise assessment of the LLMs' legal capabilities from three cognitive levels: (1) Legal knowledge memorization: whether LLMs can memorize needed legal concepts, articles and facts; (2) Legal knowledge understanding: whether LLMs can comprehend entities, events and relationships within legal text; (3) Legal knowledge applying: whether LLMs can properly utilize their legal knowledge and make necessary reasoning steps to solve realistic legal tasks. LawBench contains 20 diverse tasks covering 5 task types: single-label classification (SLC), multi-label classification (MLC), regression, extraction and generation. We perform extensive evaluations of 51 LLMs on LawBench, including 20 multilingual LLMs, 22 Chinese-oriented LLMs and 9 legal specific LLMs. The results show that GPT-4 remains the best-performing LLM in the legal domain, surpassing the others by a significant margin. While fine-tuning LLMs on legal specific text brings certain improvements, we are still a long way from obtaining usable and reliable LLMs in legal tasks. All data, model predictions and evaluation code are released in https://github.com/open-compass/LawBench/. We hope this benchmark provides in-depth understanding of the LLMs' domain-specified capabilities and speed up the development of LLMs in the legal domain.
- Law > Criminal Law (1.00)
- Law > Litigation (0.93)
- Government > Regional Government (0.93)
Discrimination in the Age of Algorithms
Kleinberg, Jon, Ludwig, Jens, Mullainathan, Sendhil, Sunstein, Cass R.
But the ambiguity of human decision-making often makes it extraordinarily hard for the legal system to know whether anyone has actually discriminated. To understand how algorithms affect discrimination, we must therefore also understand how they affect the problem of detecting discrimination. By one measure, algorithms are fundamentally opaque, not just cognitively but even mathematically. Yet for the task of proving discrimination, processes involving algorithms can provide crucial forms of transparency that are otherwise unavailable. These benefits do not happen automatically. But with appropriate requirements in place, the use of algorithms will make it possible to more easily examine and interrogate the entire decision process, thereby making it far easier to know whether discrimination has occurred. By forcing a new level of specificity, the use of algorithms also highlights, and makes transparent, central tradeoffs among competing values. Algorithms are not only a threat to be regulated; with the right safeguards in place, they have the potential to be a positive force for equity.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Texas > Tarrant County > Fort Worth (0.04)
- North America > United States > Wisconsin > Price County (0.04)
- (11 more...)
- Law > Labor & Employment Law (1.00)
- Law > Criminal Law (1.00)
- Law > Civil Rights & Constitutional Law (1.00)
- (7 more...)
Shedding Light on Black Box Machine Learning Algorithms: Development of an Axiomatic Framework to Assess the Quality of Methods that Explain Individual Predictions
From self-driving vehicles and back-flipping robots to virtual assistants who book our next appointment at the hair salon or at that restaurant for dinner - machine learning systems are becoming increasingly ubiquitous. The main reason for this is that these methods boast remarkable predictive capabilities. However, most of these models remain black boxes, meaning that it is very challenging for humans to follow and understand their intricate inner workings. Consequently, interpretability has suffered under this ever-increasing complexity of machine learning models. Especially with regards to new regulations, such as the General Data Protection Regulation (GDPR), the necessity for plausibility and verifiability of predictions made by these black boxes is indispensable. Driven by the needs of industry and practice, the research community has recognised this interpretability problem and focussed on developing a growing number of so-called explanation methods over the past few years. These methods explain individual predictions made by black box machine learning models and help to recover some of the lost interpretability. With the proliferation of these explanation methods, it is, however, often unclear, which explanation method offers a higher explanation quality, or is generally better-suited for the situation at hand. In this thesis, we thus propose an axiomatic framework, which allows comparing the quality of different explanation methods amongst each other. Through experimental validation, we find that the developed framework is useful to assess the explanation quality of different explanation methods and reach conclusions that are consistent with independent research.
- North America > United States > Wisconsin > Price County (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- Europe > Germany > Baden-Württemberg > Karlsruhe Region > Karlsruhe (0.04)
- Research Report > New Finding (1.00)
- Overview (0.92)
- Law (1.00)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- (6 more...)